Exploiting the LDC Chinese-English Bilingual Wordlist for Cross Language Information Retrieval

Author

  • Kui-Lam Kwok
Abstract

We investigated using the LDC English/Chinese bilingual wordlists for English-Chinese cross language retrieval. It is shown that the Chinese-to-English wordlist can be considered as both a phrase and word dictionary, and is preferable to the English-to-Chinese version in terms of phrase translation and word translation selection. Additional techniques such as frequency-based term selection, translation set weighting and term co-occurrence data were employed. Experiments show that within the TREC 5&6 Chinese corpus and retrieval environment, 74% of monolingual effectiveness is achievable for short queries of a few English words, and 85% for long queries of paragraph sizes.
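As a rough illustration of wordlist-based query translation with phrase lookup and translation set weighting, the following minimal sketch may help. It assumes a hypothetical toy English-to-Chinese wordlist, greedy longest-match phrase lookup, and an equal split of weight among the candidates in each translation set; the entries, the function name `translate_query`, and the weighting rule are illustrative assumptions, not the LDC data or the procedure used in the paper.

```python
from collections import defaultdict

# Hypothetical toy wordlist (English unit -> candidate Chinese translations).
# These entries are invented for illustration; they are not LDC wordlist data.
WORDLIST = {
    "information retrieval": ["信息检索"],
    "information": ["信息", "资料"],
    "retrieval": ["检索"],
    "language": ["语言"],
}


def translate_query(query, wordlist, max_phrase_len=3):
    """Translate a query by greedy longest-match lookup.

    Each matched unit (phrase or word) contributes a total weight of 1.0,
    split equally among its candidate translations (an assumed weighting).
    Words with no wordlist entry are skipped.
    """
    tokens = query.lower().split()
    weights = defaultdict(float)
    i = 0
    while i < len(tokens):
        # Try the longest phrase starting at position i first.
        for n in range(min(max_phrase_len, len(tokens) - i), 0, -1):
            unit = " ".join(tokens[i:i + n])
            if unit in wordlist:
                candidates = wordlist[unit]
                for cand in candidates:
                    weights[cand] += 1.0 / len(candidates)
                i += n
                break
        else:
            i += 1  # no entry for this word; move on
    return dict(weights)


if __name__ == "__main__":
    print(translate_query("information retrieval language", WORDLIST))
    print(translate_query("information language", WORDLIST))
```

In this sketch the phrase "information retrieval" is translated as a unit rather than word by word, which is the kind of phrase-level lookup the abstract refers to; an ambiguous word such as "information" spreads its weight over its candidate translations instead.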

Similar resources

English-Chinese CLIR using a Simplified PIRCS System

A GUI is presented with our PIRCS retrieval system for supporting English-Chinese cross language information retrieval. The query translation approach is employed using the LDC bilingual wordlist. Given an English query, different translation methods and their retrieval results can be demonstrated.

TREC-9 Cross Language, Web and Question-Answering Track Experiments using PIRCS

In TREC-9, we participated in the English-Chinese Cross Language, 10GB Web data ad-hoc retrieval as well as the Question-Answering tracks, all using automatic procedures. All these tracks were new for us. For Cross Language track, we made use of two techniques of query translation: MT software and bilingual wordlist lookup with disambiguation. The retrieval lists from them were then combined as...

Research on Lucene-based English-Chinese Cross-Language Information Retrieval

In this paper, we present our English-Chinese Cross-Language Information Retrieval (CLIR) system. We focus our attention on finding effective translation equivalents between English and Chinese, and improving the performance of Chinese IR. On English-Chinese CLIR, we adopt query translation as the dominant strategy, and utilize English-Chinese bilingual dictionary as the important knowledge res...

English-Chinese Cross-Language Information Retrieval using Lucene Toolkit

In this paper, we present our English-Chinese Cross-Language Information Retrieval (CLIR) system. We focus our attention on finding effective translation equivalents between English and Chinese, and improving the performance of Chinese IR. On English-Chinese CLIR, we adopt query translation as the dominant strategy, and utilize English-Chinese bilingual dictionary as the important knowledge res...

English-Chinese Cross-Language IR Using Bilingual Dictionaries

This report describes the English-Chinese cross-language experiments at Berkeley for the TREC-9 Cross-Language Information Retrieval track. We present a simple and effective Chinese word segmentation method and compare the cross-language retrieval performance of two bilingual dictionaries for query translation.

Journal title:
  • Int. J. Comput. Proc. Oriental Lang.

Volume 14

Publication date: 2001